NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Auditing Privacy Mechanisms via Label Inference Attacks

Busa-Fekete, Robert; Dick, Travis; Gentile, Claudio; Muñoz-Medina, Andrés; Smith, Adam D; Swanberg, Marika (December 2024, Advances in Neural Information Processing Systems (NeurIPS) 2024)

Full Text Available
How Much Data Is Sufficient to Learn High-Performing Algorithms?

https://doi.org/10.1145/3676278

Balcan, Maria-Florina; Deblasio, Dan; Dick, Travis; Kingsford, Carl; Sandholm, Tuomas; Vitercik, Ellen (October 2024, Journal of the ACM)

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses atraining setof problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for derivinggeneralization guaranteesthat bound the difference between the algorithm’s average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g.,12,16,20,62] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm’s performance is a piecewise-constant, piecewise-linear, or—more generally—piecewise-structuredfunction of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.
more » « less
Full Text Available
Learning to Branch: Generalization Guarantees and Limits of Data-Independent Discretization

https://doi.org/10.1145/3637840

Balcan, Maria-Florina; Dick, Travis; Sandholm, Tuomas; Vitercik, Ellen (April 2024, Journal of the ACM)

Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial and non-convex problems. For example, they are the foremost method for solving (mixed) integer programs and constraint satisfaction problems. Tree search algorithms come with a variety of tunable parameters that are notoriously challenging to tune by hand. A growing body of research has demonstrated the power of using a data-driven approach to automatically optimize the parameters of tree search algorithms. These techniques use atraining setof integer programs sampled from an application-specific instance distribution to find a parameter setting that has strong average performance over the training set. However, with too few samples, a parameter setting may have strong average performance on the training set but poor expected performance on future integer programs from the same application. Our main contribution is to provide the firstsample complexity guaranteesfor tree search parameter tuning. These guarantees bound the number of samples sufficient to ensure that the average performance of tree search over the samples nearly matches its future expected performance on the unknown instance distribution. In particular, the parameters we analyze weightscoring rulesused for variable selection. Proving these guarantees is challenging because tree size is a volatile function of these parameters: we prove that, for any discretization (uniform or not) of the parameter space, there exists a distribution over integer programs such that every parameter setting in the discretization results in a tree with exponential expected size, yet there exist parameter settings between the discretized points that result in trees of constant size. In addition, we provide data-dependent guarantees that depend on the volatility of these tree-size functions: our guarantees improve if the tree-size functions can be well approximated by simpler functions. Finally, via experiments, we illustrate that learning an optimal weighting of scoring rules reduces tree size.
more » « less
Full Text Available
Confidence-ranked reconstruction of census microdata from published statistics

https://doi.org/10.1073/pnas.2218605120

Dick, Travis; Dwork, Cynthia; Kearns, Michael; Liu, Terrance; Roth, Aaron; Vietri, Giuseppe; Wu, Zhiwei Steven (February 2023, Proceedings of the National Academy of Sciences)

A reconstruction attack on a private dataset D takes as input some publicly accessible information about the dataset and produces a list of candidate elements of D . We introduce a class of data reconstruction attacks based on randomized methods for nonconvex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of D from aggregate query statistics Q ( D )∈ℝ m but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset D was sampled, demonstrating that they are exploiting information in the aggregate statistics Q ( D ) and not simply the overall structure of the distribution. In other words, the queries Q ( D ) are permitting reconstruction of elements of this dataset, not the distribution from which D was drawn. These findings are established both on 2010 US decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset and provide further motivation for the careful application of provably private techniques such as differential privacy.
more » « less
Full Text Available
Learning piecewise Lipschitz functions in changing environments

Balcan, Maria-Florina; Dick, Travis; Sharma, Dravyansh (August 2020, Proceedings of Machine Learning Research)
null (Ed.)
Full Text Available
Semi-bandit Optimization in the Dispersed Setting

Balcan, Maria-Florina; Dick, Travis; Pegden, Wesley (August 2020, Uncertainty in artificial intelligence)
null (Ed.)
Full Text Available
Learning to Link

Balcan, Maria-Florina; Dick, Travis; Lang, Manuel (April 2020, International Conference on Learning Representation)
null (Ed.)
Full Text Available
Algorithms and Learning for Fair Portfolio Design

Diana, Emily; Dick, Travis; Elzayn, Hadi; Kearns, Michael; Roth, Aaron; Schutzman, Zachary; Sharifi-Malvajerdi, Saeed; Ziani, Juba (January 2021, Economics and Computation (EC) 2021)

We consider a variation on the classical finance problem of optimal portfolio design. In our setting, a large population of consumers is drawn from some distribution over risk tolerances, and each consumer must be assigned to a portfolio of lower risk than her tolerance. The consumers may also belong to underlying groups (for instance, of demographic properties or wealth), and the goal is to design a small number of portfolios that are fair across groups in a particular and natural technical sense. Our main results are algorithms for optimal and near-optimal portfolio design for both social welfare and fairness objectives, both with and without assumptions on the underlying group structure. We describe an efficient algorithm based on an internal two-player zero-sum game that learns near-optimal fair portfolios ex ante and show experimentally that it can be used to obtain a small set of fair portfolios ex post as well. For the special but natural case in which group structure coincides with risk tolerances (which models the reality that wealthy consumers generally tolerate greater risk), we give an efficient and optimal fair algorithm. We also provide generalization guarantees for the underlying risk distribution that has no dependence on the number of portfolios and illustrate the theory with simulation results.
more » « less
Full Text Available
Random Smoothing Might be Unable to Certify L_infinity Robustness for High-Dimensional Images

Blum, Avrim; Dick, Travis; Manoj, Naren; Zhang, Hongyang (January 2020, Journal of machine learning research)
null (Ed.)
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the ℓp ball of radius ϵ when p>2. Although random smoothing has been well understood for the ℓ2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the ℓ∞ threat model. In this work, we show that any noise distribution D over R^d that provides ℓp robustness for all base classifiers with p>2 must satisfy E[η_i^2]= Ω(d^(1−2/p) ϵ^2 (1−δ)/δ^2) for 99% of the features (pixels) of vector η∼D, where ϵ is the robust radius and δ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
more » « less
Full Text Available
Random Smoothing Might be Unable to Certify L_infinity Robustness for High-Dimensional Images

Blum, Avrim; Dick, Travis; Manoj, Naren; Zhang, Hongyang (January 2020, Journal of machine learning research)
null (Ed.)
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the ℓp ball of radius ϵ when p>2. Although random smoothing has been well understood for the ℓ2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the ℓ∞ threat model. In this work, we show that any noise distribution D over Rd that provides ℓp robustness for all base classifiers with p>2 must satisfy E[η_i^2]=Ω(d^(1−2/p) ϵ^2(1−δ)/δ^2) for 99% of the features (pixels) of vector η∼D, where ϵ is the robust radius and δ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
more » « less
Full Text Available

« Prev Next »

Search for: All records